Abstract 

The 16,229 bp long mitochondrial genome (mitogenome) of a tessaratomid bug, Eusthenes cupreus (Westwood), is 
reported and analyzed. It is the first complete mitogenome sequenced for the heteropteran family 
Tessaratomidae. The mitogenome of E. cupreus is a typical circular DNA molecule with a total AT content of 74.1%, 
and contains 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes, two ribosomal RNA (rRNA) genes, and a 
control region. The gene arrangement is identical to the most common type in insects. Most PCGs start with the typical 
ATN codon, except COI and ND6, which start with TTG, and ND1, which starts with GTG. All tRNAs possess the typical clover-leaf structure, except 
tRNA^Ser(AGN), in which the dihydrouridine (DHU) arm forms a simple loop. Six domains with 45 helices and three domains 
with 27 helices are predicted in the secondary structures of rrnL and rrnS, respectively. The control region is located 
between rrnS and tRNA^Ile, and includes some short microsatellite repeat sequences. In addition, three different repetitive 
sequences are found in the control region and the tRNA^Ile-tRNA^Gln-tRNA^Met-ND2 gene cluster. One of the unusual features 
of this mitogenome is the presence of one tRNA^Gln-like sequence in the control region. This extra tRNA^Gln-like sequence is 
73 bp long, and its anticodon arm is identical to that of the regular tRNA^Gln. 
Introduction 

The mitogenome of insects is typically a double-stranded, circular molecule, and commonly includes 13 PCGs, 22 
tRNA genes, 2 rRNA genes (rrnL and rrnS), and one non-coding region known as the control region, which plays a 
role in the initiation of transcription and replication (Wolstenholme 1992; Boore 1999). The mitogenome is becoming 
increasingly important for the study of population genetics and molecular evolution and has been widely regarded 
as a molecular marker for phylogenetic analysis in metazoans because of its relatively simple genetic 
structure, high rate of evolution, low or absent sequence recombination, and evolutionarily conserved gene 
products (Lin et al. 2004; Gissi et al. 2008). In addition, the comparative study of mitogenome sequences can give 
us a better understanding of the genome structure, gene arrangement, and the evolution of arthropod lineages 
(Boore 1999; Shao et al. 2001; Hwang et al. 2001; Nardi et al. 2003). 

Tessaratomidae is a family of true bugs with approximately 240 species and 55 genera (Rolston et al. 1994; 
Rider 2006). All tessaratomids are large to extremely large (often over 20 mm, some longer than 40 mm), robustly 
ovate or elongate, and phytophagous. They generally feed on plants of the orders Rosales and 
Sapindales, and spend most of their lives on tree leaves and stems. Many species are of economic importance as 
agricultural pests, such as the litchi stink bug, Tessaratoma papillosa (Westwood), which is a destructive pest of 
litchi trees in China (Cassis & Gordon 2002). A few species are also consumed as human food in some countries, such as the edible 
stink bug Encosternum delegorguei Spinola, which is a well-known food in Zimbabwe and among the Venda 
people of South Africa (Dzerefos et al. 2002). 
Up to now, mitogenome sequences of Tessaratomidae have not been reported. This lack of mitogenome data 
impedes evolutionary studies of this group. In the current study, we determined the 
complete mitogenome of a tessaratomid fruit tree pest, Eusthenes cupreus (Westwood), and analyzed its 
mitogenomic architecture, including nucleotide composition, genomic arrangement, codon usage, and RNA 
secondary structure. 


Materials and methods 

Samples and DNA extraction. Adult specimens of E. cupreus were collected in Yaoqu, Yunnan Province, China, 
in May 2009. All specimens were initially preserved in 95% ethanol in the field, and transferred to -20°C for long-term 
storage. Genomic DNA was extracted from the thoracic muscle tissue of one adult male using a CTAB-based 
protocol (Aljanabi et al. 1997). The voucher specimen (No. VHem-00225) was deposited at the 
Entomological Museum of China Agricultural University (Beijing). 

PCR amplification and sequencing. The complete mitogenome was amplified as overlapping PCR 
fragments (Table 1) using a range of universal insect mitochondrial primers (Simon et al. 2006). Species-specific 
primers were designed based on the sequenced fragments to bridge gaps. PCR and sequencing reactions were 
conducted following Li et al. (2012). 

Annotation and bioinformatics analysis. PCGs and rRNAs were initially identified using BLAST searches 
in GenBank and subsequently by alignment with genes of other true bugs. tRNAs were identified by tRNAscan-SE 
Search Server v. 1.21 (Lowe & Eddy 1997). tRNA genes that could not be detected by tRNAscan-SE were 
identified in the unannotated regions by sequence similarity to tRNAs of other heteropterans (Dotson & Beard 
2001; Li et al. 2011, 2012a, 2012b). Base composition, codon usage, and nucleotide substitution were analyzed 
with MEGA 5.0 (Tamura et al. 2011). AT-skew = (A-T)/(A+T) and GC-skew = (G-C)/(G+C) were used to measure 
the base-compositional difference between genes (Perna & Kocher 1995). 
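
The two skew formulas above can be illustrated with a minimal Python sketch. It is not the MEGA 5.0 workflow used in this study; the function name `skews` and the toy sequence are hypothetical and chosen only for illustration.

```python
from collections import Counter


def skews(seq: str) -> dict:
    """Return AT content (%), AT-skew and GC-skew of a nucleotide sequence.

    AT-skew = (A - T) / (A + T); GC-skew = (G - C) / (G + C).
    U is treated as T; any other symbol (e.g. N) is ignored.
    """
    counts = Counter(seq.upper().replace("U", "T"))
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c
    return {
        "AT%": 100 * (a + t) / total if total else 0.0,
        "AT_skew": (a - t) / (a + t) if (a + t) else 0.0,
        "GC_skew": (g - c) / (g + c) if (g + c) else 0.0,
    }


# Hypothetical toy sequence: 4 A, 3 T, 2 G, 1 C.
result = skews("AAAATTTGGC")
print(result)
```

A positive AT-skew means more A than T on the strand analyzed, and a negative GC-skew means more C than G.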

Construction of secondary structures of RNAs and non-coding regions. Secondary structures of the small 
and large subunits of rRNAs were inferred using models predicted for other insects (Gillespie et al. 2006; Zhou et 
al. 2007; Cameron & Whiting 2008; Li et al. 2011). Stem-loops were named according to the convention of 
Gillespie et al. (2006) and Cameron & Whiting (2008). The structures of regions lacking significant homology and of other non-coding 
regions were predicted using Mfold (Zuker 2003). 


Results and discussion 

Genome organization. The complete mitogenome of E. cupreus is a typical double-stranded, circular molecule, 
16,229 bp in size (GenBank accession number: JQ910983). It contains the typical gene content (13 PCGs, 22 
tRNA genes, two rRNA genes, and a control region) (Fig. 1, Table 2). The gene orientation and order are the same as 
those in Drosophila yakuba (Clary et al. 1985), which has been hypothesized to be the ancestral arrangement for 
insects (Boore 1999). The majority-coding strand (J-strand) encodes 23 genes, and the other 14 genes are oriented 
on the minority-coding strand (N-strand). Gene overlaps are found at 12 locations and involve a total of 79 bp. The 
longest overlap is 44 bp and is located between ATP6 and COIII. In addition, this mitogenome harbors 162 bp of 
intergenic spacer sequences, which are spread over 15 regions, ranging in size from 1 to 55 bp. The longest 
intergenic spacer sequence is located between tRNA^Ile and tRNA^Gln. 
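
How spacers and overlaps follow from gene coordinates can be shown with a short sketch. It assumes 1-based, inclusive coordinates sorted by start position, as in Table 2; the function name `intergenic` is hypothetical, and only the first five genes of Table 2 are used.

```python
def intergenic(genes):
    """Compute the gap between each pair of adjacent genes.

    genes: list of (name, start, end) with 1-based inclusive coordinates,
    sorted by start. A positive gap is an intergenic spacer, zero means the
    genes abut, and a negative gap is an overlap of that many base pairs.
    """
    gaps = []
    for (up_name, _up_start, up_end), (down_name, down_start, _down_end) in zip(
        genes, genes[1:]
    ):
        gaps.append((up_name, down_name, down_start - up_end - 1))
    return gaps


# Coordinates of the first five genes as listed in Table 2.
genes = [
    ("tRNA-Ile", 1, 66),
    ("tRNA-Gln", 122, 191),
    ("tRNA-Met", 206, 273),
    ("ND2", 274, 1257),
    ("tRNA-Trp", 1256, 1324),
]
gaps = intergenic(genes)
for upstream, downstream, gap in gaps:
    print(f"{upstream} / {downstream}: {gap}")
```

For these five genes the sketch gives 55, 14, 0 and -2, which agrees with the corresponding entries in Table 2, including the 55 bp spacer between tRNA^Ile and tRNA^Gln.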

Protein-coding genes. The total length of all 13 PCGs is 11,083 bp, accounting for 68.3% of the entire length 
of the E. cupreus mitogenome. The overall AT content of PCGs is 73.6%. Start codons of most PCGs are ATN, with 
the exceptions of the TTG start codon of ND6 and COI, and the GTG start codon of ND1. The majority 
of PCGs have a complete termination codon, either TAA (ND2, ATP8, ND5, ND4, ND4L, ND6, and ND1) or TAG 
(ATP6, ND3, and CytB), while the remaining three PCGs use the incomplete termination codons TA (COI) and T 
(COII and COIII) (Table 2). The incomplete stop codons are presumably completed by post-transcriptional 
polyadenylation (Ojala et al. 1981). 
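
As a quick consistency check, a gene ending in an incomplete stop codon has a length that is not a multiple of three. The sketch below applies this to four gene sizes taken from Table 2; the function name `stop_codon_status` is hypothetical, and the check assumes each annotation starts at the first base of the start codon.

```python
def stop_codon_status(length_bp: int) -> str:
    """Classify a PCG by the number of bases left after the last full codon."""
    leftover = length_bp % 3
    if leftover == 0:
        return "complete"
    # One leftover base corresponds to a T stop, two to a TA stop.
    return f"incomplete ({leftover} nt)"


# Gene sizes (bp) as listed in Table 2.
pcg_lengths = {"ND2": 984, "COI": 1535, "COII": 679, "COIII": 787}
status = {gene: stop_codon_status(n) for gene, n in pcg_lengths.items()}
print(status)
```

The result is consistent with the text: COI leaves two bases (TA), COII and COIII leave one base (T), and ND2 ends on a full codon.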
TABLE 1. The primers used in this study. 

No. of fragment^a   Primer name  Nucleotide sequence (5'-3')  Reference
1                   TI-J34       GCCTGATAAAAAGGRTTAYYTTGATA   Simon et al., 2006
                    C1-N1738     TTTATTCGTGGRAATGCYATRTC      Simon et al., 2006
2                   TW-J1301     GTTAAWTAAACTAATARCCTTCAAA    Simon et al., 2006
                    C1-N2776     GGTAATCAGAGTATCGWCGNGG       Simon et al., 2006
3                   F2100        AATTGGWGGWTTYGGAAAYTG        Simon et al., 2006
                    R3000        GGTAATCAGAGTATCGWCGNGG       Simon et al., 2006
4                   C1-J2756     ACATTTTTTCCTCAACATTT         Simon et al., 2006
                    C2-N3665     CCACAAATTTCTGAACACTG         Simon et al., 2006
5                   C2-J3399     TCTATTGGTCATCAATGGTACTG      Simon et al., 2006
                    A8-N4061     GAAAATAAATTTGTTATCATTTTCA    Simon et al., 2006
6                   F4050        GCTCCTTTATGATGAGAAAG         Present study
                    R4740        TCCTAACTTCAATCCTGATG         Present study
7                   A6-J4463     TTTGCCCATCTWGTWCCNCAAGG      Simon et al., 2006
                    C3-N5460     TCAACAAAATGTCARTAYCA         Simon et al., 2006
8                   C3-J4792     GTTGATTATAGACCWTGRCC         Simon et al., 2006
                    N3-N5731     TTAGGGTCAAATCCRCAYTC         Simon et al., 2006
9                   C3-J5470     GCAGCTGCYTGATAYTGRCA         Simon et al., 2006
                    TN-N6160     TCAATTTTRTCATTAACAGTGA       Simon et al., 2006
10                  F5670        TGAGGTAGATATCCTTTTAG         Present study
                    R7470        ATCTTGTTGGGTTGAGATGG         Present study
11                  N5-J7572     AAAGGGAATTTGAGCTCTTTT        Simon et al., 2006
                    N4-N8727     AAATCTTTRATTGCTTATTCWTC      Simon et al., 2006
12                  F8530        CTACGACTATGAGTACGTTC         Present study
                    R10460       CCGTTTGCATGGGTATATCG         Present study
13                  CB-J10621    CTCATACTGATGAAATTTTGGTTC     Simon et al., 2006
                    CB-N11526    TTCTACTGGTCGTGCTCCAATTCA     Simon et al., 2006
14                  CB-J11335    CATATTCAACCAGAATGATA         Simon et al., 2006
                    N1-N12067    AATCGTTCTCCATTTGATTTTGC      Simon et al., 2006
15                  F12030       AAGAAGTAAGATCACATCCC         Present study
                    R12261       GTGGTGCTTGTATCCTTATG         Present study
16                  N1-J12261    TACCTCATAAGAAATAGTTTGAGC     Simon et al., 2006
                    LR-N13000    TTACCTTAGGGATAACAGCGTAA      Simon et al., 2006
17                  LR-J12888    CCGGTTTGAACTCARATCATGTAA     Simon et al., 2006
                    LR-N13889    ATTTATTGTACCTTKTGTATCAG      Simon et al., 2006
18                  SR-J13342    CCTTCGCACRGTCAAAATACYGC      Simon et al., 2006
                    SR-N14220    ATATGYACAYATCGCCCGTC         Simon et al., 2006
19                  LR-J14197    GTAAAYCTACTTTGTTACGACTT      Simon et al., 2006
                    SR-N14745    GTGCCAGCAAYCGCGGTTATAC       Simon et al., 2006
20                  SR-J14610    ATAATAGGGTATCTAATCCTAGT      Simon et al., 2006
                    TM-N200      ACCTTTATAARTGGGGTATGARCC     Simon et al., 2006

^a The orientation is as shown in Fig. 1. 
TABLE 2. Organization of the E. cupreus mitogenome. 

Gene            Direction  Location      Size (bp)  Anticodon        Start codon  Stop codon  Intergenic nucleotides*
tRNA^Ile        F          1-66          66         32-34 GAT
tRNA^Gln        R          122-191       69         158-160 TTG                               55
tRNA^Met        F          206-273       68         236-238 CAT                               14
ND2             F          274-1257      984                         ATA          TAA         0
tRNA^Trp        F          1256-1324     69         1289-1291 TCA                             -2
tRNA^Cys        R          1317-1380     64         1348-1350 GCA                             -8
tRNA^Tyr        R          1388-1451     64         1418-1420 GTA                             7
COI             F          1456-2990     1535                        TTG          TA-         4
tRNA^Leu(UUR)   F          2991-3057     67         3020-3022 TAA                             0
COII            F          3058-3736     679                         ATA          T-          0
tRNA^Lys        F          3737-3810     74         3771-3773 CTT                             0
tRNA^Asp        F          3814-3878     65         3844-3846 GTC                             3
ATP8            F          3885-4043     159                         ATG          TAA         6
ATP6            F          4037-4747     711                         ATG          TAG         -7
COIII           F          4704-5490     787                         ATG          T-          -44
tRNA^Gly        F          5491-5555     65         5521-5523 TCC                             -3
ND3             F          5553-5909     357                         ATA          TAG         14
tRNA^Ala        F          5924-5992     69         5952-5954 TGC                             -1
tRNA^Arg        F          5994-6057     64         6023-6025 TCG                             1
tRNA^Asn        F          6070-6139     70         6102-6104 GTT                             12
tRNA^Ser(AGN)   F          6139-6211     73         6169-6171 GCT                             -1
tRNA^Glu        F          6211-6275     65         6242-6244 TTC                             -1
tRNA^Phe        R          6274-6339     66         6306-6308 GAA                             -2
ND5             R          6339-8051     1713                        ATA          TAA         -1
tRNA^His        R          8053-8114     62         8082-8084 GTG                             1
ND4             R          8118-9443     1326                        ATG          TAA         3
ND4L            R          9437-9718     282                         ATT          TAA         -7
tRNA^Thr        F          9730-9795     66         9760-9762 TGT                             11
tRNA^Pro        R          9796-9858     63         9826-9828 TGG                             0
ND6             F          9872-10357    486                         TTG          TAA         13
CytB            F          10359-11495   1137                        ATG          TAG         1
tRNA^Ser(UCN)   F          11494-11562   69         11524-11526 TGA                           -2
ND1             R          11580-12506   927                         GTG          TAA         17
tRNA^Leu(CUN)   R          12507-12573   67         12540-12542 TAG                           0
lrRNA           R          12574-13849   1276                                                 0
tRNA^Val        R          13850-13916   67         13886-13888 TAC                           0
srRNA           R          13917-14709   793                                                  0
Control region             14710-16229   1520                                                 0

* Negative numbers indicate that adjacent genes overlap. 
FIGURE 1. Map of the mitogenome of E. cupreus. The tRNAs are denoted by the color blocks and are labeled according to the 
IUPAC-IUB single-letter amino acid codes. A gene name without underline indicates transcription from left to 
right, and one with underline indicates right to left. Overlapping lines within the circle denote PCR fragments amplified for 
cloning and sequencing. 

tRNA and rRNA genes. Twenty-two tRNAs are found in the E. cupreus mitogenome, ranging in size from 62 to 
74 bp. All tRNAs can be folded into the typical clover-leaf structure, with the exception of tRNA^Ser(AGN), the 
dihydrouridine (DHU) arm of which forms a simple loop (14 bp) (Fig. 2). This phenomenon is common in the 
mitogenomes of true bugs (Li et al. 2012a, 2012b), and is also considered a typical feature of metazoan mtDNAs 
(Lavrov et al. 2004). In most tRNAs, the amino acid acceptor (AA) arms and the anticodon (AC) arms are more 
conserved, except for one tRNA, which possesses an unusually long AC stem (9 bp in contrast to the normal 5 bp) 
with a bulged nucleotide in the middle. The lengths of the DHU and TΨC (T) stems (2-5 bp) and 
loops (3-9 bp) are more variable (Fig. 2). 

Based on the secondary structures, a total of 17 unmatched base pairs are found in the E. cupreus tRNAs. 
Unmatched base pairs are present in the stems of 10 tRNAs, and comprise 13 G-U, 3 U-U, and 1 A-C 
mismatches. A post-transcriptional RNA editing mechanism has been proposed to correct errors such as 
mismatches, bulged nucleotides (U) in stems, and abnormal loops and arms, and thereby maintain the function of these 
tRNAs (Tomita et al. 2001; Lavrov et al. 2004; Zhang et al. 2008). 
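
The pair categories counted above can be made explicit with a small sketch that classifies stem pairs as Watson-Crick, G-U, or other mismatches. The function names and the toy stem are hypothetical; the actual counts in this study were read from the inferred structures in Fig. 2, not computed this way.

```python
from collections import Counter

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pair(x: str, y: str) -> str:
    """Classify one stem pair; T is read as U so DNA or RNA input works."""
    pair = (x.upper().replace("T", "U"), y.upper().replace("T", "U"))
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "G-U"
    return "mismatch"


def tally(stem_pairs):
    """Count pair categories over a list of (base, base) stem positions."""
    return Counter(classify_pair(x, y) for x, y in stem_pairs)


# Hypothetical stem: two canonical pairs, one G-U pair, one U-U and one A-C.
toy_stem = [("G", "C"), ("A", "T"), ("G", "U"), ("U", "U"), ("A", "C")]
counts = tally(toy_stem)
print(counts)
```

Note that the text above counts G-U pairs among the unmatched pairs, whereas the sketch reports them as a separate category.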


264 ■ Zootaxa 3620 (2) © 2013 Magnolia Press 


SONG ET AL. 

[Figure 2 graphic. Recoverable labels: Alanine (A), Asparagine (N); discriminator nucleotide, acceptor stem, TΨC arm, variable loop, anticodon arm.] 

FIGURE 2. Inferred secondary structures of the 22 tRNAs of the E. cupreus mitogenome. The tRNAs are labeled with the 
abbreviations of their corresponding amino acids (gray indicates the mismatches). Dashes (-) indicate Watson-Crick base 
pairing and dots (•) indicate G-U base pairing. 
FIGURE 3. Predicted secondary structure of the rrnL gene in E. cupreus. Roman numerals denote the conserved domain 
structure. The numbering system follows Gillespie et al. (2006) (established at the Comparative RNA Website). Dashes (-) 
indicate Watson-Crick base pairing and dots (•) indicate G-U base pairing. 

The large and small ribosomal RNA genes are located between tRNA^Leu(CUN) and tRNA^Val, and between tRNA^Val and the 
control region, respectively (Fig. 1). The length of rrnL is 1,276 bp, and that of rrnS is 793 bp. The secondary structures of both 
rrnL and rrnS can be predicted from the models of other insects (Cannone et al. 2002; Gillespie et al. 2006; Zhou et 
al. 2007; Cameron & Whiting 2008). The secondary structure of rrnL consists of six structural domains and 45 
helices (Fig. 3), and that of rrnS includes three structural domains and 27 helices (Fig. 4). 


TABLE 3. The nucleotide composition of the E. cupreus mitogenome. 

Feature                        T(U)   C      A      G      AT%    AT skew  GC skew
Whole genome                   32.1   14.3   42.0   11.6   74.1   0.133    -0.103
Protein-coding genes           40.6   13.1   33.0   13.3   73.6   -0.103   0.010
First codon position           33     12.0   34.8   20.1   67.9   0.027    0.250
Second codon position          46     19.0   20.4   14.9   66.1   -0.382   -0.119
Third codon position           43     8.2    43.7   5.0    86.7   0.008    -0.241
Protein-coding genes J-strand  35.1   14.5   37.3   13.2   72.3   0.031    -0.049
Protein-coding genes N-strand  49.5   10.8   26.1   13.6   75.6   -0.309   0.117
tRNA genes                     36.7   11.3   37.7   14.3   74.4   0.013    0.120
tRNA genes J-strand            34.6   12.8   39.8   12.9   74.4   0.069    0.005
tRNA genes N-strand            49.5   10.8   26.1   13.6   75.6   -0.309   0.117
rRNA genes                     31.5   14.6   44.5   9.5    75.9   0.171    -0.213
Control region                 33.6   14.8   41.1   10.5   74.7   0.101    -0.169
Nucleotide composition and codon usage. The mitogenome of E. cupreus is typically biased toward A and T 
with an average AT content of 74.1% (A = 42.0%, T = 32.1%, C = 14.3%, G = 11.6%) (Table 3). The AT content 
of PCGs, tRNAs, and rRNAs is 73.6%, 74.4%, and 75.9%, respectively (Table 3). Asymmetry in the nucleotide composition 
between the J-strand and N-strand is observed in this mitogenome and is a common phenomenon in the Metazoa 
(Perna & Kocher 1995). The PCGs and tRNAs on the J-strand display positive AT-skew and negative GC-skew, 
whereas those on the N-strand show negative AT-skew and nearly equal G and C. This feature is probably related 
to asymmetrical directional mutation pressure (Min & Hickey 2007). 

The AT bias is also reflected in the codon usage. Analysis of base composition at each codon position of the 
concatenated 13 PCGs shows that the AT content of the third codon position (86.7%) is higher than that of the first (67.9%) 
and second (66.1%) codon positions (Table 3). NNA and NNC codons are more frequent than NNU and NNG in 
the J-strand PCGs, whereas the opposite holds for the N-strand genes. In addition, the most frequently used 
codons are AT-rich codons, such as TTA, ATT, ATA, TTT, AAT, and TAT (Table 4). 
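
The per-position AT content discussed above can be computed as in the following sketch. It is an illustration only, not the MEGA 5.0 analysis used here; the function name `at_by_codon_position` and the 12 bp toy coding sequence are hypothetical.

```python
def at_by_codon_position(cds: str) -> list:
    """Return AT content (%) at the first, second and third codon positions.

    The sequence is assumed to start in frame; trailing bases that do not
    complete a codon (e.g. an incomplete stop codon) are ignored.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    result = []
    for pos in range(3):
        bases = cds[pos:usable:3]  # every third base, starting at this position
        at = sum(base in "AT" for base in bases)
        result.append(100 * at / len(bases) if bases else 0.0)
    return result


# Hypothetical toy CDS with codons ATT TTA GCA TAA.
at_content = at_by_codon_position("ATTTTAGCATAA")
print(at_content)
```

In the toy sequence the third position is entirely A or T, mirroring the direction of the bias reported for the third codon position.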
FIGURE 4. Predicted secondary structure of the rrnS gene in E. cupreus. Roman numerals denote the conserved domain 
structure. Dashes (-) indicate Watson-Crick base pairing and dots (•) indicate G-U base pairing. Structural annotations follow 
Fig. 3. 
TABLE 4. Codon usage of E. cupreus mitochondrial PCGs. 

Amino acid  Codon  N     RSCU   N+    RSCU   N-    RSCU
Phe (F)     UUU    243   1.59   103   1.36   140   1.83
            UUC    62    0.41   49    0.64   13    0.17
Leu (L)     UUA    350   4.19   205   4.44   145   3.88
            UUG    34    0.41   9     0.19   25    0.67
            CUU    59    0.71   20    0.43   39    1.04
            CUC    5     0.06   1     0.02   4     0.11
            CUA    47    0.56   40    0.87   7     0.19
            CUG    6     0.07   2     0.04   4     0.11
Ile (I)     AUU    312   1.68   191   1.61   121   1.81
            AUC    59    0.32   46    0.39   13    0.19
Met (M)     AUA    268   1.76   196   1.84   72    1.58
            AUG    36    0.24   17    0.16   19    0.42
Val (V)     GUU    85    1.65   25    0.83   60    2.82
            GUC    12    0.23   10    0.33   2     0.09
            GUA    100   1.94   81    2.68   19    0.89
            GUG    9     0.17   5     0.17   4     0.19
Ser (S)     UCU    95    2.17   26    1.04   69    3.66
            UCC    14    0.32   8     0.32   6     0.32
            UCA    99    2.26   71    2.84   28    1.48
            UCG    2     0.05   2     0.08   0     0
Pro (P)     CCU    68    1.97   42    1.66   26    2.81
            CCC    21    0.61   13    0.51   8     0.86
            CCA    47    1.36   44    1.74   3     0.32
            CCG    2     0.06   2     0.08   0     0
Thr (T)     ACU    80    1.66   46    1.29   34    2.72
            ACC    23    0.48   19    0.53   4     0.32
            ACA    87    1.8    76    2.13   11    0.88
            ACG    3     0.06   2     0.06   1     0.08
Ala (A)     GCU    69    1.74   32    1.19   37    2.9
            GCC    11    0.28   9     0.33   2     0.16
            GCA    78    1.96   67    2.48   11    0.86
            GCG    1     0.03   0     0      1     0.08
Tyr (Y)     UAU    154   1.78   65    1.76   89    1.8
            UAC    19    0.22   9     0.24   10    0.2
Stop (*)    UAA    7     1.4    3     0.5    4     2
            UAG    3     0.6    3     0.5    0     0
His (H)     CAU    50    1.37   37    1.23   13    2
            CAC    23    0.63   23    0.77   0     0
Gln (Q)     CAA    52    1.68   41    1.82   11    1.29
            CAG    10    0.32   4     0.18   6     0.71
Asn (N)     AAU    155   1.74   94    1.63   61    1.94
            AAC    23    0.26   21    0.37   2     0.06
Lys (K)     AAA    76    1.52   59    1.71   17    1.1
            AAG    24    0.48   10    0.29   14    0.9
Asp (D)     GAU    62    1.72   37    1.61   25    1.92
            GAC    10    0.28   9     0.39   1     0.08
Glu (E)     GAA    69    1.59   52    1.76   17    1.21
            GAG    18    0.41   7     0.24   11    0.79
Cys (C)     UGU    31    1.59   8     1.23   23    1.77
            UGC    8     0.41   5     0.77   3     0.23
Trp (W)     UGA    96    1.94   68    2      28    1.81
            UGG    3     0.06   0     0      3     0.19
Arg (R)     CGU    17    1.24   5     0.57   12    2.4
            CGC    3     0.22   1     0.11   2     0.4
            CGA    28    2.04   24    2.74   4     0.8
            CGG    7     0.51   5     0.57   2     0.4
Ser (S)     AGU    40    0.91   15    0.6    25    1.32
            AGC    3     0.07   3     0.12   0     0
            AGA    98    2.23   75    3      23    1.22
            AGG    0     0      0     0      0     0
Gly (G)     GGU    69    1.27   15    0.44   54    2.63
            GGC    8     0.15   2     0.06   6     0.29
            GGA    112   2.06   100   2.96   12    0.59
            GGG    28    0.52   18    0.53   10    0.49

N, total number in all PCGs; N+, total number in J-strand; N-, total number in N-strand; RSCU, relative synonymous codon 
usage. 
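
For readers unfamiliar with RSCU, the sketch below shows the standard calculation for one codon family: the observed count of a codon divided by the count expected if all synonymous codons were used equally. The function name `rscu` is hypothetical; the counts are the N values for Phe and Leu in Table 4.

```python
def rscu(counts: dict) -> dict:
    """Relative synonymous codon usage within one synonymous codon family.

    RSCU = observed count * number of synonymous codons / family total.
    A value above 1 means the codon is used more often than expected under
    equal usage.
    """
    total = sum(counts.values())
    n_codons = len(counts)
    return {
        codon: (n_codons * count / total if total else 0.0)
        for codon, count in counts.items()
    }


# N values (all PCGs) from Table 4.
phe = rscu({"UUU": 243, "UUC": 62})
leu = rscu({"UUA": 350, "UUG": 34, "CUU": 59, "CUC": 5, "CUA": 47, "CUG": 6})
print({codon: round(value, 2) for codon, value in phe.items()})
print({codon: round(value, 2) for codon, value in leu.items()})
```

Rounded to two decimals, these values reproduce the RSCU column of Table 4 for the two families (for example 1.59 for UUU and 4.19 for UUA).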


Control region. The control region of E. cupreus is 1,520 bp long, is located between rrnS and tRNA^Ile, and 
contains 74.7% AT nucleotides (Fig. 5A). The stem-loop structures in the control region have been suggested as the 
site of initiation of secondary-strand replication (Zhang et al. 1995; Clary & Wolstenholme 1987), and 
sequences flanking the stem-loop structure are highly conserved among several insect orders, possessing 
“TATA” consensus sequences at the 5' end and “G(A)nT” consensus sequences at the 3' end (Schultheis et al. 2002; 
Zhang & Hewitt 1997). Within the control region of the E. cupreus mitogenome, three sequence stretches have the 
potential to form stem-loop structures (Fig. 5B). 

The presence of tandem repeats in the control region has been frequently reported in many other insects, and 
replication slippage is regarded as the dominant mechanism accounting for the existence of tandem repeats (Oliveira 
et al. 2008; Ojala et al. 1981). However, such tandem repeats are not found in the control region of the E. cupreus 
mitogenome. In this control region, some short repeats are present that can be considered microsatellites, such 
as (TTAATAATAATA)n, (AATAATTTATT)n, (TTAATAAATAAAA)n, (TA)n, and (AT)n. 

In addition, three different repetitive sequences are found in the control region and the tRNA^Ile-tRNA^Gln-tRNA^Met-ND2 
gene cluster (Fig. 5A). The first repetitive sequence has two 52 bp repeat units; one repeat unit is located in 
the control region (15,464-15,515), and the other is found between tRNA^Ile and tRNA^Gln (66-117). The second 
repetitive sequence has two 26 bp repeat units; one repeat unit is located downstream of the first 
repetitive sequence in the control region (15,535-15,560), and the other is a partial sequence of tRNA^Gln 
(143-168). The third repetitive sequence has two 80 bp repeat units; one repeat unit is located 
downstream of the second repetitive sequence in the control region (15,606-15,685), and the other is a partial 
sequence of ND2 (318-397). 
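
The method used to detect these repeats is not stated here. As one possible illustration, the sketch below finds the longest exact segment shared by two regions with a simple dynamic-programming scan; the function name `longest_shared_segment` and both toy sequences are hypothetical and are not taken from the E. cupreus mitogenome.

```python
def longest_shared_segment(a: str, b: str):
    """Return (length, start_in_a, start_in_b) of the longest exact shared run.

    Start positions are 0-based. Runs in O(len(a) * len(b)) time, which is
    adequate for regions of a few kilobases.
    """
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous base of each sequence.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


# Hypothetical toy regions sharing one 13 bp segment.
region_a = "GGGTATCTCCTTAAAGCCC"
region_b = "AATTTATCTCCTTAAAGTT"
length, start_a, start_b = longest_shared_segment(region_a, region_b)
print(length, region_a[start_a:start_a + length])
```

Adding 1 to the returned start positions converts them to the 1-based coordinates used in the text.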
One of the unusual features of the E. cupreus mitogenome is the presence of one tRNA^Gln-like sequence in the 
control region (Fig. 5C). The presence of tRNA-like sequences within the control region has also been reported in 
other insects (Cha et al. 2007; Hong et al. 2008, 2009; Kim et al. 2009, 2010, 2011), but is rarely observed 
in Heteroptera. This extra tRNA^Gln-like sequence is 73 bp long (15,512-15,584), and its anticodon arm is identical 
to that of the regular tRNA^Gln in the E. cupreus mitogenome. The two sequences share 56% nucleotide 
identity, and the variation is mainly located in the AA arm. The control region is a non-coding region in 
vertebrate mtDNA, and has been shown to contain the replication origin for the heavy strand of mammalian mtDNA. The 
tRNA-like sequences in the control region may be remnants left in the nascent DNA strand after serving as primers 
(Cantatore et al. 1987; Wan et al. 2012). 
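
A percent-identity figure of this kind can be computed from a pairwise alignment as sketched below. How gaps were treated in this study is not stated; the sketch assumes that gap columns count as differences, and the function name `percent_identity` and the toy alignment are hypothetical.

```python
def percent_identity(x: str, y: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Gaps are written as '-'. Columns where both sequences have a gap are
    skipped; a gap opposite a base counts as a difference (an assumption).
    """
    if len(x) != len(y):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (p, q) for p, q in zip(x.upper(), y.upper()) if not (p == "-" and q == "-")
    ]
    matches = sum(p == q for p, q in columns)
    return 100 * matches / len(columns) if columns else 0.0


# Hypothetical toy alignment: 10 columns, 8 identical.
identity = percent_identity("ACGT-ACGTA", "ACGTTACCTA")
print(identity)
```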


[Figure 5 graphic. Recoverable labels: (A) control region (1,520 bp), tRNA^Ile-tRNA^Gln-tRNA^Met-ND2 gene cluster, srRNA, extra tRNA^Gln-like sequence, Repeats 1-3; (B) Stem-loops 1-3; (C) regular tRNA and tRNA-like sequence.] 

FIGURE 5. Control region of the E. cupreus mitogenome. (A) Structural elements found in the control region of E. cupreus. 
The red, yellow and green boxes represent three different repetitive sequences found in the control region and the tRNA^Ile-tRNA^Gln-tRNA^Met-ND2 
gene cluster. (B) The putative stem-loop structures found in the control region. The light green and pink 
indicate highly conserved flanking sequences. (C) Predicted clover-leaf secondary structure of the extra tRNA^Gln-like sequence 
in the control region (gray indicates sequences identical to the regular tRNA^Gln; light green indicates differing 
sequences). An alignment with the corresponding regular tRNA^Gln sequence is also provided. The boxed nucleotides indicate the 
anticodon, which designates the corresponding tRNA. 